The data set was obtained by request from the National Heart, Lung and Blood Institute (NHLBI). In order to maintain anonymity, variables from the orginal dataset were permutated across observations. The original dataset was collected by the Digitalis Investigation Group from over 300 centers in the US and Canada for the purpose of investigating the effect of Digoxin dosage on hospitalization and mortality in patients with heart failure and normal sinus rhythm.
dig.data <- read.csv("dig.csv") %>% tbl_df
# select only females 80 or younger without stroke or diabetes
dig.data.subset <- dig.data %>%
filter(SEX == "2" & AGE <= "80" & DIABETES == "0" & STRK == "0"
& !is.na(CHFDUR) & !is.na(HEARTRTE) & !is.na(DIABP) & !is.na(FUNCTCLS)
& !is.na(PREVMI))
# select only the variables needed
dig.data.subset <- select(dig.data.subset, recordID, TRTMT, AGE,
RACE, EJF_PER, BMI, CHFDUR, NSYM, HEARTRTE,
DIABP, SYSBP, FUNCTCLS, PREVMI,
HYPERTEN, DIGDOSE, WHF, WHFDAYS,
DIG, DEATH, DEATHDAY)
Originally there were 6,800 observations. The chosen subset contains women under the age of 80 with no history of diabetes or stroke. I removed 7 observations with missing values in any of the variables. I also removed variables that I have no intention of using.
dig.data.subset
# A tibble: 984 × 27
recordID TRTMT AGE RACE EJF_PER BMI CHFDUR NSYM HEARTRTE DIABP
<int> <int> <int> <int> <int> <dbl> <int> <int> <int> <int>
1 3 0 72 1 36 25.530 12 4 91 70
2 6 0 69 2 45 27.770 84 4 64 76
3 7 1 64 1 30 31.694 31 1 102 90
4 10 1 64 1 24 28.697 33 4 112 80
5 13 0 66 1 42 32.802 1 4 72 78
6 27 0 65 2 34 27.673 22 4 85 80
7 31 1 74 1 30 24.691 84 2 81 70
8 32 1 77 1 41 25.113 83 4 76 68
9 41 0 71 2 34 24.314 24 4 88 60
10 47 0 74 1 38 34.330 25 4 76 70
# ... with 974 more rows, and 17 more variables: SYSBP <int>,
# FUNCTCLS <int>, PREVMI <int>, HYPERTEN <int>, DIGDOSE <dbl>,
# WHF <int>, WHFDAYS <int>, DIG <int>, DEATH <int>, DEATHDAY <int>,
# TRTMT.F <fctr>, PREVMI.F <fctr>, RACE.F <fctr>, HYPERTEN.F <fctr>,
# WHF.F <fctr>, DIG.F <fctr>, FUNCTCLS.F <fctr>
The data set contains 984 observations with 20 columns including the record ID.
| Variable | Type | Notes |
|---|---|---|
| recordID | Record ID | ID from original dataset |
| TRTMT | Binary | Treatment: 0 = control (51%), 1 = treated (49%) |
| AGE | Quantitative | Age ranging from 28 to 80 in years |
| RACE | Binary | 1 = white (81%), 2 = non-white (19%) |
| EJF_PER | Quantitative | Ejection Fraction percentage [6, 45] |
| BMI | Quantitative | Body Mass Index (kg/M*M) [16, 56] |
| CHFDUR | Integer | Congestive Heart Failure (CHF) duration months [0, 300] |
| NSYM | Integer | Number of Symptoms of CHF [0, 4] |
| HEARTRTE | Quantitative | Heart Rate (BPM) [44, 132] |
| DIABP | Quantitative | Diastolic Blood Pressure [25, 150] |
| SYSBP | Quantitative | Systolic Blood Pressure [80, 190] |
| FUNCTCLS | Categorical | NYHA Functional Class: I (8.6%), II (49.4%), III (39.7%), IV (2.3%) |
| PREVMI | Binary | Previous MI: 0 = No (37%), 1 = Yes (63%) |
| HYPERTEN | Binary | History of Hypertension: 0 = No (54.7%), 1 = Yes (45.3%) |
| DIURET | Binary | Diuretic Use: 0 = No (15%), 1 = Yes (85%) |
| DIGDOSE | Quantitative | Digoxin/Placebo Dosage Prescribed [0, 0.5] |
| WHF | Binary Outcome | Worsening Heart Failure (WHF): 0 = No (67%), 1 = Yes (33%) |
| WHFDAYS | Quantitative Outcome | Days since randomization to Hospitalization for WHF [0, 1770] |
| DIG | Binary Outcome | Digoxin Toxicity: 0 = No (98%), 1 = Yes (2%) |
| DEATH | Binary Outcome | Death: 0 = No (68%), 1 = Yes (32%) |
| DEATHDAY | Quantitative Outcome | Days since randomization to Death [1, 1770] |
Patients with heart failure, normal sinus rhythm, and left ventricular ejection fractions of less than or equal to 0.45 were included in the trial. This subset of the data includes only women with no history of diabetes or stroke. The patients were taken from 300 centers in the United States and Canada between February 1991 and September 1993 with follow ups until December of 1995.
recordID: Original Record ID for patients.TRMT: Binary categorical variable indicating whether or not the patient got the digoxin treatment or not. A value of 0 indicates not treated.AGE: Patient age in years at the time of randomization.RACE: Patient race, either White (1) or Non-white (2).EJF_PER: Patient Ejection Fraction expressed as a percentage. This gives a measure of how much blood leaves the heart with each contraction. Higher is better.BMI: Body Mass Index, height to mass ratio used to indicate obesity.CHFDUR: Congestive Heart Failure Duration, the number of months patient has been in congestive heart failure prior to randomizationNSYM: Number of Symptoms of Congestive Heart Failure, the sum of 8 possible symptoms including: Rales, Elevated Jugular Venous Pressure, Peripheral Edema, Dyspnea at rest, Dyspnea on exertion, Limitation of activity, S3, Radiologic evidence of congestion. If four or more symptoms were noted, the value was given a 4.HEARTRTE: Patient heartrate in beats per minute prior to randomization.DIABP: Patient diastolic blood pressure (mm Hg) prior to randomization.SYSBP: Patient sytolic blood pressure (mm Hg) prior to randomization.FUNCTCLS: New York Heart Association Functional Class. Categorical variable ranging from class I - IV that assesses patient’s ability to perform physical activity. Higher class numbers indicate more severe physical limitations.PREVMI: Previous myocardial infarction (1) or not (0).HYPERTEN: Whether the patient has a history of hypertension (1) or not (0).DIURET: Whether the patient is on non-potassium sparing diuretics (1) or not (0).DIGDOSE: Prescribed dosage of digoxin in miligrams/day.WHF: Worsening Heart Failure. If the patient was hospitalized, did the symptoms of heart failure increase?WHFDAYS: Number of days since randomization until hospitalization for worsening heart failure.DIG: Whether the patient exhibited digoxin toxicity (1) or not (0).DEATH: Whether the patient died (1) or not (0).DEATHDAY: Number of days since randomization until death or last contact date if alive..F denotes the corresponding factor variableThe quantiative outcome for the linear regression model will be WHFDAYS or the days since randomization until hospitalization for worsening heart failure with the following predictors:
The binary outcome for the logistic regression model will be mortality with the following predictors:
The data set meets all of the requirements specified in the project instructions. I am certain that it is completely appropriate for these data to be shared with anyone, without any conditions. There are no concerns about privacy or security.
Predict the number of days since randomization until hospitalization for worsening heart failure in patients on digoxin versus the control.
ggplot(dig.data.subset.training, aes(x = WHFDAYS)) +
geom_histogram(fill = "hotpink", col = "white") +
labs(title = "Untransformed Continuous Outcome",
x = "Worsening Heart Failure Days",
subtitle = "Days since randomization to Hospitalization for Worsening HF") +
theme_classic()
The outcome clusters around 50 and again around 1300 days, giving a bimodal distribution. This could prove challening to fit a linear model. The data are reasonably symmetrical, however, so I will not transform.
plot(spearman2(WHFDAYS ~ AGE + FUNCTCLS + TRTMT.F + DIGDOSE + RACE.F + EJF_PER + BMI + CHFDUR + NSYM + HEARTRTE + PREVMI.F + HYPERTEN.F,
data = dig.data.subset))
The Spearman plot suggests that if a cubic spline will be used in the model, it should be fit to the ejection fraction percentage (continuous variable), perhaps with an interaction term including functional class.
# use all the variables with ordinary least squares
# include a cubic spline with interaction terms
dd <- datadist(dig.data.subset.training)
options(datadist = "dd")
model.ks <- ols(WHFDAYS ~ AGE + FUNCTCLS + TRTMT.F + DIGDOSE + RACE.F + rcs(EJF_PER,3) + BMI + CHFDUR + NSYM + HEARTRTE + PREVMI.F + HYPERTEN.F,
data = dig.data.subset.training, x = TRUE, y = TRUE)
model.ks
Linear Regression Model
ols(formula = WHFDAYS ~ AGE + FUNCTCLS + TRTMT.F + DIGDOSE +
RACE.F + rcs(EJF_PER, 3) + BMI + CHFDUR + NSYM + HEARTRTE +
PREVMI.F + HYPERTEN.F, data = dig.data.subset.training, x = TRUE,
y = TRUE)
Model Likelihood Discrimination
Ratio Test Indexes
Obs 738 LR chi2 48.97 R2 0.064
sigma516.5181 d.f. 13 R2 adj 0.047
d.f. 724 Pr(> chi2) 0.0000 g 151.956
Residuals
Min 1Q Median 3Q Max
-1146.23 -443.45 83.94 412.63 1179.47
Coef S.E. t Pr(>|t|)
Intercept 1265.0856 284.8089 4.44 <0.0001
AGE -3.2119 1.8237 -1.76 0.0786
FUNCTCLS -131.1723 29.6707 -4.42 <0.0001
TRTMT.F=DIG 38.9697 38.2243 1.02 0.3083
DIGDOSE 213.0461 264.9100 0.80 0.4215
RACE.F=Non-White -175.8059 48.2123 -3.65 0.0003
EJF_PER 2.6748 5.1547 0.52 0.6040
EJF_PER' 2.6127 6.1917 0.42 0.6732
BMI -1.8320 3.6590 -0.50 0.6167
CHFDUR -0.6348 0.5147 -1.23 0.2179
NSYM -16.7393 27.0333 -0.62 0.5360
HEARTRTE 1.7728 1.5602 1.14 0.2562
PREVMI.F=Yes -17.9434 39.9525 -0.45 0.6535
HYPERTEN.F=Yes 29.0962 38.4111 0.76 0.4490
rms::vif(model.ks)
AGE FUNCTCLS TRTMT.F=DIG DIGDOSE
1.016157 1.063855 1.009358 1.011247
RACE.F=Non-White EJF_PER EJF_PER' BMI
1.020382 5.822272 5.709028 1.021325
CHFDUR NSYM HEARTRTE PREVMI.F=Yes
1.013138 1.011802 1.014537 1.022787
HYPERTEN.F=Yes
1.011665
The variables NYHA functional class and race were signficant in this model. Age was significant to the 90% level. From this output, older patient age, higher functional class (more physical impairment), and non-white status were associated with a shorter timeframe to hospitalization. (Remember, we want more days until hospitalization for heart failure.) Interestingly, neither the treatment (digoxin) nor the dosage were significant in this model. Note the abysmal \(R^2\) value of 0.064 and the \(R_{adj}\) of 0.047; perhaps the model is over-fitted.
The variance inflation factors are all very close to 1 (and certainly below 5, with the exception of the nonlinear terms); collinearity is not an issue here.
# plot the anova values of the variables
plot(anova(model.ks))
The anova plot suggests that only Functional Class and Race were significant at the 5% level, while Age and Ejection fraction were significant at the 10% level.
plot(summary(model.ks), main="")
The odds ratio plot for the kitchen sink model shows that AGE, FUNCTCLS, EJF_PER and RACE.F have significant effects on the outcome.
plot(nomogram(model.ks))
plot(calibrate(model.ks))
n=738 Mean absolute error=35.706 Mean squared error=2031.191
0.9 Quantile of absolute error=78.51
This calibration plot is quite nonlinear, suggesting that this model does not predict optimally in any regions.
# save the model predictors
predictors <- with(dig.data.subset.training, cbind(AGE, EJF_PER,
BMI, CHFDUR, NSYM, HEARTRTE,
FUNCTCLS, DIGDOSE, TRTMT.F,
PREVMI.F, RACE.F, HYPERTEN.F))
x1 <- regsubsets(predictors, dig.data.subset.training$WHFDAYS, nvmax = 12)
rs <- summary(x1)
rs
Subset selection object
12 Variables (and intercept)
Forced in Forced out
AGE FALSE FALSE
EJF_PER FALSE FALSE
BMI FALSE FALSE
CHFDUR FALSE FALSE
NSYM FALSE FALSE
HEARTRTE FALSE FALSE
FUNCTCLS FALSE FALSE
DIGDOSE FALSE FALSE
TRTMT.F FALSE FALSE
PREVMI.F FALSE FALSE
RACE.F FALSE FALSE
HYPERTEN.F FALSE FALSE
1 subsets of each size up to 12
Selection Algorithm: exhaustive
AGE EJF_PER BMI CHFDUR NSYM HEARTRTE FUNCTCLS DIGDOSE TRTMT.F
1 ( 1 ) " " " " " " " " " " " " "*" " " " "
2 ( 1 ) " " " " " " " " " " " " "*" " " " "
3 ( 1 ) " " "*" " " " " " " " " "*" " " " "
4 ( 1 ) "*" "*" " " " " " " " " "*" " " " "
5 ( 1 ) "*" "*" " " "*" " " " " "*" " " " "
6 ( 1 ) "*" "*" " " "*" " " "*" "*" " " " "
7 ( 1 ) "*" "*" " " "*" " " "*" "*" " " "*"
8 ( 1 ) "*" "*" " " "*" " " "*" "*" "*" "*"
9 ( 1 ) "*" "*" " " "*" " " "*" "*" "*" "*"
10 ( 1 ) "*" "*" " " "*" "*" "*" "*" "*" "*"
11 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*" "*"
12 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*" "*"
PREVMI.F RACE.F HYPERTEN.F
1 ( 1 ) " " " " " "
2 ( 1 ) " " "*" " "
3 ( 1 ) " " "*" " "
4 ( 1 ) " " "*" " "
5 ( 1 ) " " "*" " "
6 ( 1 ) " " "*" " "
7 ( 1 ) " " "*" " "
8 ( 1 ) " " "*" " "
9 ( 1 ) " " "*" "*"
10 ( 1 ) " " "*" "*"
11 ( 1 ) " " "*" "*"
12 ( 1 ) "*" "*" "*"
Performing a best subsets approach shows that the NYHA functional class is in all models, with ejection fraction in all but two. The instance of a previous myocardial infarction is represented in only 1 out of the 12 models.
pander(t(temp))
| p | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
| Radj | 0.027 | 0.044 | 0.048 | 0.051 | 0.052 | 0.052 | 0.052 | 0.052 | 0.051 | 0.051 | 0.05 | 0.048 |
| bic | -7.72 | -15.34 | -13.24 | -9.96 | -4.59 | 0.68 | 6.14 | 12.03 | 18.03 | 24.27 | 30.67 | 37.06 |
| aic.cor | 9235 | 9220 | 9216 | 9213 | 9212 | 9210 | 9209 | 9208 | 9208 | 9208 | 9207 | 9207 |
| Cp | 18.9 | 6.5 | 4 | 2.7 | 3.5 | 4.2 | 5.1 | 6.4 | 7.8 | 9.4 | 11.2 | 13 |
| dif | 16.9 | 3.5 | 0 | -2.3 | -2.5 | -2.8 | -2.9 | -2.6 | -2.2 | -1.6 | -0.8 | 0 |
Models with 6 - 9 predictors (including intercept) had the highest \(R_{adj}\) values, 0.052. Variables typically included were: Age, Ejection Fraction, Functional Class, and Digoxin Dose. Because we want to maximize \(R_{adj}\), while minimizing the BIC, AIC, and \(C_p\), model 4 is looking pretty good. Model 4 includes the intercept, ejection fraction, functional class, and race. See the plot below.
By inspection, the \(C_p\) statistic is best with 4 coefficients (intercept included), however the Adjusted R-squared doesn’t level off until around 6 coefficients. Similarly the Bias-Corrected AIC is still quite large with 4. The BIC looks best with 3 coefficients, however it is only marginally better than the model with 4.
Compare 4 models based on best values for Cp, Corrected AIC, BIC, and Adjusted \(R^2\). The candidate models include:
| Summary | Coefficients | Predictors |
|---|---|---|
| Cp | 4 | Functional Class, Race, Ejection Fraction |
| Corr AIC | 12 | Age, Functional Class, Treatment, Digoxin Dose, Race, BMI, Heart Failure Duration, Number of HF symptoms, Heart rate, Ejection Fraction |
| BIC | 3 | Functional Class, Race |
| Adjusted R2 | 6 | Age, Functional Class, Race, Heart Failure Duration, Ejection Fraction |
# models are named by adding 1 to the number of variables.
lm1 <- lm(WHFDAYS ~ 1, data = dig.data.subset.training)
lm3 <- lm(WHFDAYS ~ FUNCTCLS + RACE.F,
data = dig.data.subset.training)
lm4 <- lm(WHFDAYS ~ FUNCTCLS + RACE.F + EJF_PER,
data = dig.data.subset.training)
lm6 <- lm(WHFDAYS ~ AGE + FUNCTCLS + RACE.F + CHFDUR +
EJF_PER, data = dig.data.subset.training)
lm12 <- lm(WHFDAYS ~ AGE + FUNCTCLS + TRTMT.F + DIGDOSE + RACE.F +
BMI + CHFDUR + NSYM + HEARTRTE + EJF_PER +
HYPERTEN.F, data = dig.data.subset.training)
anova(lm12, lm6, lm4, lm3, lm1)
Analysis of Variance Table
Model 1: WHFDAYS ~ AGE + FUNCTCLS + TRTMT.F + DIGDOSE + RACE.F + BMI +
CHFDUR + NSYM + HEARTRTE + EJF_PER + HYPERTEN.F
Model 2: WHFDAYS ~ AGE + FUNCTCLS + RACE.F + CHFDUR + EJF_PER
Model 3: WHFDAYS ~ FUNCTCLS + RACE.F + EJF_PER
Model 4: WHFDAYS ~ FUNCTCLS + RACE.F
Model 5: WHFDAYS ~ 1
Res.Df RSS Df Sum of Sq F Pr(>F)
1 726 193259714
2 732 194406275 -6 -1146561 0.7179 0.63531
3 734 195610306 -2 -1204031 2.2615 0.10492
4 735 196806908 -1 -1196601 4.4952 0.03433 *
5 737 206408522 -2 -9601614 18.0347 2.27e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This ANOVA comparison suggests that I use the model with 3 variables, including functional class, race, and the intercept. However, I will use the model with 4 variables as I am not constrained by degrees of freedom.
m4.ols <- ols(WHFDAYS ~ EJF_PER + FUNCTCLS + RACE.F,
data = dig.data.subset.training, x = TRUE, y = TRUE)
plot(calibrate(m4.ols))
n=738 Mean absolute error=15.751 Mean squared error=630.8489
0.9 Quantile of absolute error=36.814
The model in question: \(WHFDAYS = EJF + FUNCTCLS + RACE.F\)
This model could be worse. Specifically, values within 800 to around 1000 days are reasonably well described while values outside of this range are overpredicted. See coefficients in the table below.
| names | x | conf.low | conf.high |
|---|---|---|---|
| Intercept | 1096 | 883.3 | 1308 |
| EJF_PER | 4.63 | 0.34 | 8.93 |
| FUNCTCLS | -127.3 | -185.2 | -69.46 |
| RACE.F=Non-White | -174.4 | -268.5 | -80.42 |
# plot nomogram for model 4
plot(nomogram(m4.ols))
Use the Lasso method to produce a model.
## Lasso Method
lassopreds <- with(dig.data.subset, cbind(AGE, EJF_PER,
BMI, CHFDUR, NSYM, HEARTRTE,
FUNCTCLS, DIGDOSE, TRTMT.F,
PREVMI.F, RACE.F, HYPERTEN.F))
lasso1 <- lars(lassopreds, dig.data.subset$WHFDAYS, type="lasso")
plot(lasso1)
# plot the Mean Square Errors of to see where the fraction is minimized
set.seed(314159)
lassocv <- cv.lars(lassopreds, dig.data.subset$WHFDAYS, K=12)
# calculate the minimum L1 fraction
minL1 <- lassocv$index[which.min(lassocv$cv)]
minL1
[1] 0.6161616
The mean squared error is minmized at 0.616. This suggests using a model with around 7 predictors - many more than model in 4.
# indentify the Lasso Coefficients
coef.cv <- coef(lasso1, s=minL1, mode="fraction")
Lasso1.nz <- round(coef.cv[c("AGE", "EJF_PER", "CHFDUR", "FUNCTCLS", "DIGDOSE", "TRTMT.F", "RACE.F")], 2)
# Compare the Non Zero coefficients
lm1 <- lm(WHFDAYS ~ AGE + EJF_PER + CHFDUR + FUNCTCLS + DIGDOSE + TRTMT.F + RACE.F,
data = dig.data.subset)
lm1.coef <- round(lm.beta(lm1),3)
pander(rbind(Lasso1.nz, lm1.coef), caption="Compare the Lasso and lm Coefficients")
| Â | AGE | EJF_PER | CHFDUR | FUNCTCLS | DIGDOSE | TRTMT.F | RACE.F |
|---|---|---|---|---|---|---|---|
| Lasso1.nz | -2.11 | 4.96 | -0.06 | -135.3 | 121.8 | 2.49 | -76.02 |
| lm1.coef | -0.063 | 0.1 | -0.027 | -0.191 | 0.04 | 0.026 | -0.078 |
The lasso model suggests using the variables Age, Ejection Fraction, Functional Class, Digoxin Dose, Treatment, Race, and perhaps Heart Failure Duration. Ejection Fraction, Functional Class, and Race fit well with the best subsets model. The coefficients are quite different, suggesting that neither well predicts the model. There are no sign changes, however.
Although model 4 (AGE, FUNCTCLS, RACE.F) leaves a lot to be desired with an \(R^2\) of 0.052, I will use it to see how well this model generalizes to my test data. Look at Mean Square Prediction Error and Mean Absolute Prediction Error.
| Â | MAPE | MSPE | Max Abs. Error |
|---|---|---|---|
| Model 4 | 453.9 | 278500 | 1034 |
| Lasso | 807.1 | 3256361 | 1804 |
| Kitchen Sink | 454.6 | 280700 | 1026 |
While none of these are particularly good, I still like Model 4 the best given that its mean absolute and mean square prediction errors are smallest. The max absolute errors are close enough in both model 4 and kitchen sink that the difference is negligible.
I actually hope that this model doesn’t generalize to the whole dataset as I chose a relatively healthy subset of patients (females under 80 with no history of stroke or diabetes).
# Use m4 with the entire dataset (n=6800)
m4.final <- ols(WHFDAYS ~ EJF_PER + FUNCTCLS + RACE.F,
data = dig.data, x = TRUE, y = TRUE)
# Show coefficients with confidence interval
m4.final.confint <- round(confint_tidy(m4.final), 2)
pander(cbind(tidy(round(m4.final$coefficients, 2)), m4.final.confint),
caption = "Model 4 Generalized Coefficients, 95% CI")
| names | x | conf.low | conf.high |
|---|---|---|---|
| Intercept | 939.6 | 877.3 | 1002 |
| EJF_PER | 8.64 | 7.25 | 10.03 |
| FUNCTCLS | -124.8 | -142.7 | -107 |
| RACE.F=Non-White | -92.52 | -126.9 | -58.15 |
For ease of comparison the coefficients from model 4 applied to both datasets are reproduced below:
| Model | Intercept | EJF_PER | FUNCTCLS | RACE.F |
|---|---|---|---|---|
| Model 4 Training | 1096 [883, 1308] | 4.63 [0.34, 8.93] | -127 [-185, -69.5] | -174 [-269, -80.4] |
| Model 4 General | 940 [877, 1002] | 8.64 [7.25, 10.03] | -125 [-143, -107] | -92.5 [-127, -58.2] |
The coefficients are actually quite similar. EJF_PER and RACE.F show the greatest differences in the estimates, however their 95% CI still overlap.
m4.final
Frequencies of Missing Values Due to Each Variable
WHFDAYS EJF_PER FUNCTCLS RACE.F
0 0 6 0
Linear Regression Model
ols(formula = WHFDAYS ~ EJF_PER + FUNCTCLS + RACE.F, data = dig.data,
x = TRUE, y = TRUE)
Model Likelihood Discrimination
Ratio Test Indexes
Obs 6794 LR chi2 427.20 R2 0.061
sigma509.2601 d.f. 3 R2 adj 0.061
d.f. 6790 Pr(> chi2) 0.0000 g 147.153
Residuals
Min 1Q Median 3Q Max
-1164.57 -463.98 75.44 424.57 1106.88
Coef S.E. t Pr(>|t|)
Intercept 939.6516 31.8212 29.53 <0.0001
EJF_PER 8.6390 0.7101 12.17 <0.0001
FUNCTCLS -124.8406 9.0886 -13.74 <0.0001
RACE.F=Non-White -92.5152 17.5307 -5.28 <0.0001
Model 4 actually fits the general dataset better than the dataset it was fit to with an \(R^2\) of 0.061 as compared to 0.052 of the subsetted data. However, neither of these values is great, so this is likely an artifact of greater scatter in the larger dataset.
plot(calibrate(m4.final))
n=6794 Mean absolute error=9.976 Mean squared error=180.8847
0.9 Quantile of absolute error=17.034
The calibration plot fit to all of the data is more linear than that fit to the subset. The overall fit is quite good except for the region above 1050 days, where the model tends to overpredict.
Predict mortality on the basis of the same variables used in the linear regression model.
plot(spearman2(DEATH ~ AGE + FUNCTCLS + TRTMT.F + DIGDOSE + RACE.F + EJF_PER + BMI + CHFDUR + NSYM + HEARTRTE + PREVMI.F + HYPERTEN.F, data = dig.data.subset))
From the Spearman plot, NYHA Functional Class and Ejection Fraction look to be the most important predictors for Mortality. I will add a non-linear term to ejection fraction percentage and I will include the interaction term functional class.
Logistic Regression Model
lrm(formula = DEATH ~ AGE + FUNCTCLS + TRTMT.F + DIGDOSE + RACE.F +
rcs(EJF_PER, 3) + BMI + CHFDUR + NSYM + HEARTRTE + PREVMI.F +
HYPERTEN.F, data = dig.data.subset, x = TRUE, y = TRUE)
Model Likelihood Discrimination Rank Discrim.
Ratio Test Indexes Indexes
Obs 984 LR chi2 52.78 R2 0.073 C 0.646
0 666 d.f. 13 g 0.570 Dxy 0.291
1 318 Pr(> chi2) <0.0001 gr 1.768 gamma 0.293
max |deriv| 8e-11 gp 0.121 tau-a 0.128
Brier 0.207
Coef S.E. Wald Z Pr(>|Z|)
Intercept -0.3035 0.9912 -0.31 0.7595
AGE 0.0023 0.0068 0.33 0.7406
FUNCTCLS 0.5036 0.1103 4.57 <0.0001
TRTMT.F=DIG 0.1873 0.1407 1.33 0.1832
DIGDOSE 0.5567 0.9508 0.59 0.5582
RACE.F=Non-White -0.1455 0.1823 -0.80 0.4248
EJF_PER -0.0558 0.0183 -3.05 0.0023
EJF_PER' 0.0392 0.0243 1.61 0.1064
BMI 0.0038 0.0137 0.28 0.7795
CHFDUR -0.0003 0.0019 -0.14 0.8892
NSYM -0.0058 0.0959 -0.06 0.9517
HEARTRTE -0.0083 0.0057 -1.45 0.1474
PREVMI.F=Yes 0.0680 0.1468 0.46 0.6431
HYPERTEN.F=Yes -0.0904 0.1413 -0.64 0.5220
This model includes the same predictors as in the linear regression. In this case we are predicting mortality. The C statistic is low, at 0.646. The \(R^2\) is also very low, 0.073. Of note, only Functional Class and Ejection Fraction are significant. This is not unexpected, as these two variables were also found in the linear regression model. As expected, higher ejection fractions seem to be associated with a lower probability of mortality while the higher functional class (most physical impairment) is associated with greater probabilities of mortality.
Again, functional class and ejection fraction are most important here. However, the anova plot suggests that Heart Rate may also be important in the model.
| Â | Effect | Lower.0.95 | Upper.0.95 |
|---|---|---|---|
| AGE | 1.03 | 0.8653 | 1.226 |
| FUNCTCLS | 1.655 | 1.333 | 2.054 |
| DIGDOSE | 1.321 | 0.5203 | 3.354 |
| EJF_PER | 0.6909 | 0.5533 | 0.8627 |
| BMI | 1.022 | 0.8776 | 1.19 |
| CHFDUR | 0.9906 | 0.8669 | 1.132 |
| NSYM | 0.977 | 0.4608 | 2.072 |
| HEARTRTE | 0.8616 | 0.7043 | 1.054 |
| TRTMT.F | 1.206 | 0.9153 | 1.589 |
| RACE.F | 0.8646 | 0.6048 | 1.236 |
| PREVMI.F | 0.9343 | 0.7007 | 1.246 |
| HYPERTEN.F | 0.9135 | 0.6926 | 1.205 |
From the exponentiated coefficient table, there are significant effects from elevated Functional Class (1.67 times as likely) and higher ejection fraction (0.66 times as likely). Neither treatment nor digoxin dose had significant effects on mortality.
n=984 Mean absolute error=0.019 Mean squared error=0.00075
0.9 Quantile of absolute error=0.04
The model 1 calibration plot is quite nonlinear, suggesting that while the model predicts reasonably well at low probabilities, values above 0.4 are overpredicted.
ggplot(Predict(lrm.model.1, EJF_PER = 0:45, FUNCTCLS, TRTMT.F, fun=plogis)) +
theme_bw() +
labs(x = "Ejection Fraction Percentage",
y = "Pr(Death)",
title = "Model 1 Predictions",
subtitle = "Across Treatment Groups and Functional Class, holding all other predictors at their medians")
Plotting the probability of mortality predictions from Model 1 shows that Functional Class had a significant effect on the probability of mortality within the study period. Digoxin use was also associated with a higher risk of mortality, however the confidence intervals are overlapping and lack significance. Note the limitations of the model: an ejection fraction of 0 should give a mortality of 1 and we do not see that here.
plot(nomogram(lrm.model.1, fun=plogis, funlabel = "Probability of Mortality"))
# Use Backwards elimination to validate
set.seed(314159)
validate(lrm.model.1, bw=TRUE, B=10)
Backwards Step-down - Original Model
Deleted Chi-Sq d.f. P Residual d.f. P AIC
NSYM 0.00 1 0.9517 0.00 1 0.9517 -2.00
CHFDUR 0.02 1 0.8906 0.02 2 0.9888 -3.98
BMI 0.08 1 0.7810 0.10 3 0.9919 -5.90
AGE 0.11 1 0.7377 0.21 4 0.9948 -7.79
PREVMI.F 0.21 1 0.6468 0.42 5 0.9947 -9.58
DIGDOSE 0.33 1 0.5678 0.75 6 0.9934 -11.25
HYPERTEN.F 0.38 1 0.5364 1.13 7 0.9924 -12.87
RACE.F 0.73 1 0.3942 1.86 8 0.9851 -14.14
TRTMT.F 1.75 1 0.1853 3.61 9 0.9351 -14.39
HEARTRTE 1.89 1 0.1691 5.50 10 0.8552 -14.50
EJF_PER 15.76 2 0.0004 21.26 12 0.0467 -2.74
Approximate Estimates after Deleting Factors
Coef S.E. Wald Z P
Intercept -2.1111 0.2733 -7.726 1.110e-14
FUNCTCLS 0.5769 0.1083 5.327 1.001e-07
Factors in Final Model
[1] FUNCTCLS
index.orig training test optimism index.corrected n
Dxy 0.2008 0.2326 0.2131 0.0195 0.1813 10
R2 0.0438 0.0591 0.0494 0.0097 0.0341 10
Intercept 0.0000 0.0000 -0.0856 0.0856 -0.0856 10
Slope 1.0000 1.0000 0.9217 0.0783 0.9217 10
Emax 0.0000 0.0000 0.0335 0.0335 0.0335 10
D 0.0308 0.0421 0.0351 0.0070 0.0238 10
U -0.0020 -0.0020 0.0005 -0.0025 0.0005 10
Q 0.0329 0.0441 0.0346 0.0095 0.0233 10
B 0.2116 0.2067 0.2112 -0.0046 0.2162 10
g 0.4038 0.4694 0.4261 0.0433 0.3605 10
gp 0.0857 0.0986 0.0914 0.0071 0.0785 10
Factors Retained in Backwards Elimination
AGE FUNCTCLS TRTMT.F DIGDOSE RACE.F EJF_PER BMI CHFDUR NSYM HEARTRTE
*
* *
* *
*
* *
* * *
*
* *
* *
PREVMI.F HYPERTEN.F
Frequencies of Numbers of Factors Retained
0 1 2 3
1 3 5 1
The backwards elimination model validation suggested retaining Functional Class and Ejection Fraction Percentage. Age was retained in one of the cases.
\[ C = 0.5 + \frac{D_{xy}}{2} = 0.5 + \frac{0.181}{2} = 0.591 \]
This C statistic is terrible, predicting mortality only slightly better than random chance. Compare this to the value obtained earlier of 0.646. The search for a better model begins!
Create a new model incorporating HEARTRTE, FUNCTCLS, and EJF_PER. I opted to include HEARTRTE instead of AGE despite the backwards validation results as HEARTRTE was almost significant via the anova plot from Model 1.
dd <- datadist(dig.data.subset)
options(datadist = "dd")
model.2 <- lrm(DEATH ~ rcs(EJF_PER, 3)*FUNCTCLS + HEARTRTE,
data = dig.data.subset, x = T, y = T)
model.2
Logistic Regression Model
lrm(formula = DEATH ~ rcs(EJF_PER, 3) * FUNCTCLS + HEARTRTE,
data = dig.data.subset, x = T, y = T)
Model Likelihood Discrimination Rank Discrim.
Ratio Test Indexes Indexes
Obs 984 LR chi2 51.10 R2 0.071 C 0.639
0 666 d.f. 6 g 0.574 Dxy 0.277
1 318 Pr(> chi2) <0.0001 gr 1.775 gamma 0.280
max |deriv| 5e-09 gp 0.120 tau-a 0.122
Brier 0.207
Coef S.E. Wald Z Pr(>|Z|)
Intercept 2.0696 1.7481 1.18 0.2365
EJF_PER -0.1363 0.0736 -1.85 0.0639
EJF_PER' 0.1035 0.0959 1.08 0.2802
FUNCTCLS -0.2693 0.6355 -0.42 0.6718
HEARTRTE -0.0080 0.0057 -1.41 0.1600
EJF_PER * FUNCTCLS 0.0317 0.0280 1.13 0.2578
EJF_PER' * FUNCTCLS -0.0249 0.0377 -0.66 0.5090
The model is quite terrible with an \(R^2\) of 0.071 and a C statistic of 0.639. Only EJF_PER appears significant to the 10 percent level.
plot(anova(model.2))
Here Functional Class and Ejection Fraction Percentage appear significant.
| Â | Effect | Lower.0.95 | Upper.0.95 |
|---|---|---|---|
| EJF_PER | 0.6324 | 0.4843 | 0.8257 |
| FUNCTCLS | 1.832 | 1.32 | 2.543 |
| HEARTRTE | 0.8665 | 0.7095 | 1.058 |
The table above lists the exponentiated coefficients from model 2. Having a higher ejection fraction is associated with a lower mortality (0.63 times as likely). Patients with higher functional classes (more physical impairment) were 1.8 times as likely to die. Higher heart rates were associated with lower mortality (0.87 times as likely.)
n=984 Mean absolute error=0.009 Mean squared error=0.00021
0.9 Quantile of absolute error=0.015
This model is an improvement over the kitchen sink model. While probabilities under 0.2 tend to be underpredicted, the remainder of the data is fit reasonably well.
# Use model 2 to predict mortality
ggplot(Predict(model.2, EJF_PER = 0:45, FUNCTCLS, fun=plogis)) +
theme_bw() +
labs(x = "Ejection Fraction Percentage",
y = "Pr(Death)",
title = "Model 2 Predictions",
subtitle = "Across Functional Class, holding Heart Rate at 80 BPM")
This plot is interesting:
The nomogram suggests that higher heartrates are associated with a lower probability of mortality. Higher ejection fractions were also associated with lower mortality, especially at lower functional classes.
Although neither of these models were great, I will apply the larger subset of the data to model 1, because this dataset is much larger and will likely be harder to overfit. I also suspect that the graphical predictions from Model 2 don’t make physical sense.
Frequencies of Missing Values Due to Each Variable
DEATH AGE FUNCTCLS TRTMT.F DIGDOSE RACE.F
0 0 6 0 1 0
EJF_PER BMI CHFDUR NSYM HEARTRTE PREVMI.F
0 1 14 0 8 1
HYPERTEN.F
1
Logistic Regression Model
lrm(formula = DEATH ~ AGE + FUNCTCLS + TRTMT.F + DIGDOSE + RACE.F +
rcs(EJF_PER, 3) + BMI + CHFDUR + NSYM + HEARTRTE + PREVMI.F +
HYPERTEN.F, data = dig.data, x = TRUE, y = TRUE)
Model Likelihood Discrimination Rank Discrim.
Ratio Test Indexes Indexes
Obs 6768 LR chi2 375.70 R2 0.074 C 0.640
0 4403 d.f. 13 g 0.576 Dxy 0.280
1 2365 Pr(> chi2) <0.0001 gr 1.778 gamma 0.281
max |deriv| 2e-09 gp 0.126 tau-a 0.127
Brier 0.215
Coef S.E. Wald Z Pr(>|Z|)
Intercept -0.4127 0.3586 -1.15 0.2498
AGE -0.0016 0.0024 -0.65 0.5137
FUNCTCLS 0.4716 0.0393 12.01 <0.0001
TRTMT.F=DIG -0.0092 0.0525 -0.17 0.8614
DIGDOSE 0.4994 0.3628 1.38 0.1687
RACE.F=Non-White -0.0639 0.0747 -0.85 0.3927
EJF_PER -0.0441 0.0068 -6.47 <0.0001
EJF_PER' 0.0088 0.0084 1.04 0.2970
BMI -0.0044 0.0051 -0.86 0.3887
CHFDUR -0.0001 0.0007 -0.19 0.8456
NSYM 0.0214 0.0333 0.64 0.5197
HEARTRTE -0.0010 0.0021 -0.47 0.6396
PREVMI.F=Yes 0.0378 0.0551 0.69 0.4928
HYPERTEN.F=Yes 0.0041 0.0527 0.08 0.9378
The \(R^2\) statistic (0.074) is marginally better while the C statistic (0.64) is slightly worse. Again, Ejection Fraction and Functional Class appear to be the most important predictors.
| Â | Effect | Lower.0.95 | Upper.0.95 |
|---|---|---|---|
| AGE | 0.9783 | 0.916 | 1.045 |
| FUNCTCLS | 1.602 | 1.484 | 1.731 |
| DIGDOSE | 1.284 | 0.8996 | 1.832 |
| EJF_PER | 0.6153 | 0.5693 | 0.6651 |
| BMI | 0.9736 | 0.9161 | 1.035 |
| CHFDUR | 0.9949 | 0.9451 | 1.047 |
| NSYM | 1.089 | 0.8393 | 1.414 |
| HEARTRTE | 0.9827 | 0.9133 | 1.057 |
| TRTMT.F | 0.9909 | 0.894 | 1.098 |
| RACE.F | 0.9381 | 0.8103 | 1.086 |
| PREVMI.F | 0.9629 | 0.8644 | 1.073 |
| HYPERTEN.F | 1.004 | 0.9056 | 1.113 |
These coefficients are actually quite similar to what was obtained earlier with the subsetted data; all of the confidence intervals overlap.
n=6768 Mean absolute error=0.006 Mean squared error=7e-05
0.9 Quantile of absolute error=0.013
The calibration plot is much more linear, however we have a lot more data. The model appears to predict reaonsably well at all probabilities.
ggplot(Predict(lrm.model.1.final, EJF_PER = 0:45, FUNCTCLS, TRTMT.F, fun=plogis)) +
theme_bw() +
labs(x = "Ejection Fraction Percentage",
y = "Pr(Death)",
title = "Model 1 Predictions",
subtitle = "Across Treatment Groups and Functional Class, holding all other predictors at their medians")
This plot is more inline with what I would have expected; patients in higher functional classes have a higher probability of mortality at all ejection fractions. However, the plotted predictions appear more linear than exponential. Of note, there appears to be a significant difference between the Digoxin and the Placebo groups; this was not evident in the subsetted data.
| Model | C | R2 |
|---|---|---|
| Model 1 | 0.646 | 0.073 |
| Model 2 | 0.639 | 0.071 |
| Gen Model 1 | 0.640 | 0.074 |